Abstract

Social scientists have shown an increasing inter- 
est in understanding the structure of knowledge 
communities, and particularly the organization of 
"epistemic communities", that is groups of agents 
sharing common knowledge concerns. However, most 
existing approaches are based only on either social 
relationships or semantic similarity, while there has
been virtually no attempt to link social and semantic
aspects. In this paper, we introduce a formal frame- 
work addressing this issue and propose a method 
based on Galois lattices (or concept lattices) for 
categorizing epistemic communities in an automated 
and hierarchically structured fashion. Suggesting that 
our process allows us to rebuild a whole community 
structure and taxonomy, and notably fields and 
subfields gathering a certain proportion of agents, 
we finally apply it to empirical data to exhibit
these alleged structural properties, and successfully 
compare our results with categories spontaneously 
given by domain experts. 

Keywords: Social complex systems, Commu- 
nity representation and categorization, Scientomet- 
rics, Applied epistemology, Knowledge discovery in 
databases. 



Introduction 

Social scientists have recently shown an increasing
interest in methods for analyzing knowledge communities,
and particularly in understanding their
structure. To this end, several conceptual frameworks
as well as automated processes have been
proposed for finding groups of agents or documents 
related by common concepts or concerns, notably 
in mathematical sociology [31 1281 |2T?| , scientomet- 
rics and knowledge discovery in databases (KDD) 



*CREA (Center for Research in Applied Epistemology), 
CNRS/Ecole Polytechnique, 1 rue Descartes, 75005 Paris, 
France. Corresponding author: roth@poly.polytechnique.fr 



The focus is often on scientific communities, since a
large amount of data is available for them, and in
particular on biologist communities: biology is a domain
where the need for such techniques is most pressing,
since the article production rate is currently so high
that it is hard for scientists to know the extent of
their community and to keep track
of its evolution. In this view, it is of utmost inter- 
est to propose tools enabling agents to understand 
the structure and the activity of the community 
of knowledge they are member of, also called epis- 
temic community. Existing approaches in com- 
munity finding are either based only on social re- 
lationships, with community extraction methods 
stemming from graph theory applied to social net- 
works EH- or based only on semantic simi- 
larity, namely clustering methods applied to doc- 
ument databases where each document is consid- 
ered as a vector in a semantic space |2*%) . 

However, there has been virtually no attempt to
link social and semantic aspects, while the vari- 
ous characterizations of an epistemic community 
0] El E] insist on the fact that such a community 
is a group of agents who share and are working on 
a given subset of concepts, thus suggesting that 
we absolutely need to take into account this du- 
ality, that is, that it is made of agents and com- 
mon interests — agents having common interests. 
In this paper, we give a formal framework for de- 
scribing epistemic communities and then, we pro- 
pose a method using Galois lattices pQ as well as 
relevant criteria for categorizing these communi- 
ties in an automated and hierarchically structured 
fashion. Suggesting that our process allows us to
rebuild a whole community structure and taxonomy,
we finally apply it to empirical data and
compare our results with the
categories spontaneously given by domain experts.

Our main source of data is MedLine, a 
database maintained by the US National Library 
of Medicine and containing more than 11 million
references to health sciences articles published in
about 3,700 journals worldwide. We narrow
our study to articles dealing with the zebrafish,
a fish whose embryo is translucent and develops quickly,
and which is therefore widely used as a model animal by
embryologists.

1 Epistemic communities 
1.1 Rationales 

Several works stemming from social epistemology
to political science and economics have given an 
account of the collaboration of agents within the 
same epistemic framework and towards a given 
knowledge-related goal (namely knowledge cre- 
ation or validation) within what is also called 
an epistemic community. For social epistemolo- 
gists, it is a scientist group producing knowledge 
and recognizing a given set of conceptual tools 
and representations (the "paradigm", according
to Kuhn [22]), possibly working in a distributed
manner on specialized tasks. Considering
a whole knowledge field as a huge epistemic
community (e.g. biology, linguistics), one
can see subdisciplines as smaller embedded and 
more specific epistemic communities, being sub- 
fields within a paradigm. Haas ^3] introduced 
the notion of epistemic community as "a network 
of knowledge-based experts (...) with an author- 
itative claim to policy-relevant knowledge within 
the domain of their expertise". Cowan, David and 
Foray .4< added to this definition the fact that an 
epistemic community must share a subset of con- 
cepts. In particular, an epistemic community is "a 
group of agents working on a commonly acknowl- 
edged subset of knowledge issues and who at the 
very least accept a commonly understood procedu- 
ral authority as essential to the success of their 
knowledge activities". The "common concern" aspect
has been emphasized by Dupouet, Cohendet
and Creplet |H] who define an epistemic commu- 
nity as "a group of agents sharing a common goal 
of knowledge creation and a common framework 
allowing to understand this trend". These authors 
nevertheless acknowledge the need of a notion of 
authority and deference. 

In the prospect of knowing which agents share 
the same concerns and work on the same concepts, 
and which these concerns or concepts are, we are 



farther from the epistemological point of view and 
need not characterize authoritative groups and 
their role. Hence, the previous definitions seem 
to be too precise in respect of authoritative and 
normative properties whereas they lack the ability 
to formalize accurately community boundaries and 
extents. Obviously such a community of knowl- 
edge should not necessarily be socially linked: it 
needs to be, for instance, neither an actual department
nor a research group. The definition must also
allow some flexibility in the sense that an agent 
(or a concept) can belong to several communities. 
We keep the idea of having common "knowledge 
issues" , while we add maximality to our definition: 

Definition EC-1 (Epistemic community). 

Given a set of agents S and considering the con- 
cepts they have in common, the epistemic commu- 
nity of S is the largest set of agents who also share 
these concepts. 

This conception is to be compared with the notion
of structural equivalence introduced in sociology
by F. Lorrain and H. White [26] for describing
a community as a group of people related in an 
identical manner to a set of other people - when 
extending this notion to a group of people related 
identically to the same concept set. 

Definition EC-1 is based on an agent set, and
we could correspondingly define an epistemic
community by starting from a given set of
concepts, i.e. define it as the largest set of concepts
used by the very agents who use this given concept
set. For the sake of clarity, however, in the following
section we first focus on agent-based epistemic
communities, keeping in mind that concept-based
notions are defined strictly equivalently and in a
dual manner (see def. 4 below).

1.2 Definitions 

This being granted, we introduce from here a for- 
mal framework allowing to work on these notions. 
We present first the following basic definitions: 

Definition 1 (Intent). The intent of a set of 
agents S is the set of concepts which are used by 
every agent in S . 

Definition 2 (Epistemic group). An epistemic
group is a set of agents provided with its intent,
i.e. a group of agents and the concepts they have
in common.

Figure 1: Sample community, and relations between
agents A, B, C, D and concepts linguistics (Lng) and
neuroscience (NS) (dashed lines).

Consider for instance that agents A, B and
C work on "linguistics" (Lng), while "neuroscience"
(NS) is being used by B, C and D (fig. 1).
Therefore, the intent of {A,B} is {Lng},
that of {B,C,D} is {NS} and that of {B,C} is
{Lng,NS}. Some epistemic groups of this example
are thus ({A,B};{Lng}), ({B,C};{Lng,NS})
and ({A,D};∅).
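As a minimal sketch of Definition 1, the intents of the example above can be computed directly. The names `USES` and `intent` are illustrative and do not come from the original framework; the data simply transcribes fig. 1, and the value returned for an empty agent set is an assumed convention.

```python
# Relation of fig. 1: the concepts used by each agent (illustrative transcription).
USES = {
    "A": {"Lng"},
    "B": {"Lng", "NS"},
    "C": {"Lng", "NS"},
    "D": {"NS"},
}


def intent(agents):
    """Return the concepts used by every agent in `agents` (Definition 1)."""
    agents = list(agents)
    if not agents:
        # Assumed convention: an empty agent set trivially shares all concepts.
        return set().union(*USES.values())
    return set.intersection(*(USES[a] for a in agents))


print(intent({"A", "B"}))          # {'Lng'}
print(intent({"B", "C", "D"}))     # {'NS'}
print(sorted(intent({"B", "C"})))  # ['Lng', 'NS']
print(intent({"A", "D"}))          # set()
```

The last call returns the empty set: A and D have no concept in common.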

Given a set of agents S (notably, a group of agents
prototypical of a field), finding their epistemic
community amounts to identifying the greatest group of
people who share the same knowledge issues as these
agents (a group which thereby includes these agents).

Definition 3 (Hierarchy, maximality). An
epistemic group is greater than another epistemic
group if and only if (i) their intents are the same
and (ii) the agent set of the former contains that
of the latter.

An epistemic group is said to be maximal if there
exists no greater epistemic group.

This statement allows us not only to compare 
epistemic groups but also and more significantly 
to extend a given epistemic group to its maximal 
social size. Interpreting definition EC-1 given in
section 1.1 within this framework leads now to the
following definition: 

Definition EC-2 (Epistemic community). 

The epistemic community based on a given agent 
set is the corresponding maximal epistemic group. 



The epistemic community based on, for instance, 
{D} is thus ({B,C,D};{NS}), and the one based 
on {A}, {A,B}, or {A,B,C} is ({A,B,C};{Lng}). 1 
With this understanding, the relation between the set
of agents and the set of concepts is sufficient to
capture and describe the underlying epistemic
communities of a given scientific field. By introducing
Galois lattices, an algebraic structure particularly
appropriate for this purpose, we moreover offer a method
for representing and hierarchically grouping agents and
the concepts they use, which we ultimately wish to prove
very relevant for epistemic community categorization.
Before doing so, we quickly introduce below
the concept-based notions, defined symmetrically
to the agent-based notions:

Definition 4 (Extent, concept-based no- 
tions). The extent of a set of concepts C is the 
set of agents using at least every concept in C . A 
concept-based epistemic group is a set of concepts 
provided with its extent. A concept-based epistemic 
group is greater than another one if and only if 
(i) their extents are the same and (ii) the concept
set of the former contains that of the latter. A 
concept-based epistemic community is a maximal 
concept-based epistemic group. 
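The agent-based and concept-based notions can be sketched together on the sample community of fig. 1. The function names below are hypothetical. The sketch relies on the observation that the largest agent set sharing a given intent is the set of all agents using every concept of that intent, so the epistemic community of Definition EC-2 is obtained by taking the extent of the intent.

```python
# Relation of fig. 1 (illustrative transcription).
USES = {
    "A": {"Lng"},
    "B": {"Lng", "NS"},
    "C": {"Lng", "NS"},
    "D": {"NS"},
}


def intent(agents):
    """Concepts shared by all `agents`; assumes `agents` is non-empty."""
    return set.intersection(*(USES[a] for a in agents))


def extent(concepts):
    """Agents using at least every concept in `concepts` (Definition 4)."""
    return {a for a, used in USES.items() if set(concepts) <= used}


def epistemic_community(agents):
    """Maximal epistemic group based on `agents` (Definition EC-2)."""
    shared = intent(agents)
    return extent(shared), shared


members, concepts = epistemic_community({"D"})
print(sorted(members), sorted(concepts))  # ['B', 'C', 'D'] ['NS']
members, concepts = epistemic_community({"A", "B"})
print(sorted(members), sorted(concepts))  # ['A', 'B', 'C'] ['Lng']
```

Starting from {B} instead yields ({B,C};{Lng,NS}), the more specific community mentioned in footnote 1.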

1.3 Galois lattices (GL) 

Broadly speaking, using Galois lattices is possi- 
ble whenever there is a relation between two sets, 
which are usually a set of objects and a set of prop- 
erties. GL is suitable for representing and order- 
ing abstract categories relying on such a relation, 
and it is therefore being widely used in conceptual 
knowledge systems |3E] and formal concept classi- 
fication HI]. 2 In this view, considering agents as 
objects and concepts as properties, GL will prove 
to be an efficient tool to describe mathematically 
the notions presented above. 

Before constructing a GL we need what we call 
a "pre-Galois structure". Given two finite sets S 
and C between which we have a binary relation 

1 The epistemic community based on {B} or {C} is however
({B,C};{Lng,NS}); this accounts notably for the fact
that B can belong both to a generic community and to a
more specific or multidisciplinary community: ({B};{Lng})
vs. ({B,C};{Lng,NS}); see section 2.2 for more details.

2 As Wille points out [38], GLs give a robust formalization
of the philosophical apprehension of an abstract notion,
characterized by its extent (physical implementation) and 
its intent (properties or internal content). 
Figure 2: Extended sample community, with agents
A, B, C, D and E and concepts Lng, NS, prosody (prs), 
relevance (rlv), imagery (img) and psychology (psy). 



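$R \subseteq S \times C$, we introduce two operators "$\wedge$" and
"$\star$" such that for any subset $X \subseteq S$ (resp. $Y \subseteq C$),
$X^{\wedge}$ (resp. $Y^{\star}$) is the set of elements of $C$ (resp.
$S$) related through $R$ to every element of $X$ (resp.
$Y$), namely: 3

$$X^{\wedge} = \{\, y \in C \mid \forall x \in X,\ xRy \,\} \qquad \text{(1a)}$$
$$Y^{\star} = \{\, x \in S \mid \forall y \in Y,\ xRy \,\} \qquad \text{(1b)}$$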


Interpreting preceding definitions Definitions
1 and 4 get a clear interpretation here:
if $X$ is a set of agents, $X^{\wedge}$ obviously denotes
its intent. Similarly, if $Y$ is a concept set, $Y^{\star}$
is its extent. Thus, epistemic groups are couples
of the kind $(X, X^{\wedge})$ or $(Y^{\star}, Y)$. It is also worth
noting that $X \subseteq X' \Rightarrow X'^{\wedge} \subseteq X^{\wedge}$ (expressing the
fact that the intent of a bigger agent set is smaller:
the more numerous they are, the less they share)
and that $(X \cup X')^{\wedge} = X^{\wedge} \cap X'^{\wedge}$ (i.e. the intent of
the union of two agent sets is the intersection of their
respective intents: a group of agents has in common what
its individuals share). On the more substantial
sample community described in fig. 2 we have for
instance $\{A,C\}^{\wedge} = \{Lng\}$ and $\{NS,prs\}^{\star} = \{C\}$.
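The two properties of the intent operator noted above can be checked by brute force. The following is a sketch only: it uses the small sample community of fig. 1 rather than that of fig. 2 (whose full relation is not listed in the text), the helper names are hypothetical, and the empty agent set is given the whole concept set as intent, following footnote 3.

```python
from itertools import combinations

# Relation of fig. 1 (illustrative transcription).
USES = {"A": {"Lng"}, "B": {"Lng", "NS"}, "C": {"Lng", "NS"}, "D": {"NS"}}
ALL_CONCEPTS = set().union(*USES.values())


def intent(agents):
    """Concepts shared by all `agents`; the empty set shares every concept."""
    shared = set(ALL_CONCEPTS)
    for agent in agents:
        shared &= USES[agent]
    return shared


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


AGENT_SETS = list(subsets(USES))

# A bigger agent set has a smaller (or equal) intent.
antitone = all(intent(big) <= intent(small)
               for small in AGENT_SETS for big in AGENT_SETS if small <= big)
# The intent of a union is the intersection of the intents.
union_rule = all(intent(x | y) == intent(x) & intent(y)
                 for x in AGENT_SETS for y in AGENT_SETS)
print(antitone, union_rule)  # True True
```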

Moreover, if we take the extent $X^{\wedge\star}$ of an intent
$X^{\wedge}$, that is, apply $\star$ to $X^{\wedge}$, we get all the agents
who use the concepts that were common to
the agents of $X$ (hence the largest such agent set). In
fact, according to definitions EC-1 and EC-2 we
have:

Proposition 1. $(X^{\wedge\star}, X^{\wedge})$ is the epistemic
community based on $X$. 4

All these properties are similar and in fact dual if
we consider $Y$, $Y^{\star}$ and $Y^{\star\wedge}$.

GL and epistemic communities Besides,
the operation "$\wedge\star$" is a closure operation 0,
in that it is (i) extensive (the closure is
never smaller, $X \subseteq X^{\wedge\star}$), (ii) idempotent
(applying $\wedge\star$ more than once does not change
the closure, $(X^{\wedge\star})^{\wedge\star} = X^{\wedge\star}$) and (iii) increasing
(the closure of a smaller set is smaller,
$X \subseteq X' \Rightarrow X^{\wedge\star} \subseteq X'^{\wedge\star}$). We say that $X$ (resp.
$Y$) is a closed subset if $X^{\wedge\star} = X$ (resp. $Y = Y^{\star\wedge}$).
Given two subsets $X \subseteq S$ and $Y \subseteq C$, a couple
$(X,Y)$ is said to be closed (or complete) if and
only if $Y = X^{\wedge}$ and $X = Y^{\star}$. This very notion is
at the core of the Galois lattice definition.

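Definition 5 (GL). Given a relation $R$ between
two finite sets $S$ and $C$, the Galois lattice $\mathcal{G}_{S,C,R}$ is
the set of every closed couple $(X, Y)$, with $X \subseteq S$ and
$Y \subseteq C$, under relation $R$. Thus,
$\mathcal{G}_{S,C,R} = \{ (X^{\wedge\star}, X^{\wedge}) \mid X \subseteq S \}$.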
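Definition 5 suggests a direct, if naive, way of building a Galois lattice: close every agent subset and collect the distinct couples. The sketch below does so for the sample community of fig. 1. It is an illustration under stated assumptions, not an efficient algorithm: the enumeration is exponential in the number of agents, and the function names are hypothetical.

```python
from itertools import combinations

# Relation of fig. 1 (illustrative transcription).
USES = {"A": {"Lng"}, "B": {"Lng", "NS"}, "C": {"Lng", "NS"}, "D": {"NS"}}
ALL_CONCEPTS = set().union(*USES.values())


def intent(agents):
    """Concepts shared by all `agents`; the empty set shares every concept."""
    shared = set(ALL_CONCEPTS)
    for agent in agents:
        shared &= USES[agent]
    return shared


def extent(concepts):
    """Agents using at least every concept in `concepts`."""
    return {a for a, used in USES.items() if set(concepts) <= used}


def galois_lattice():
    """Collect the closed couples (extent, intent) of every agent subset."""
    lattice = set()
    agents = sorted(USES)
    for size in range(len(agents) + 1):
        for combo in combinations(agents, size):
            shared = intent(combo)
            lattice.add((frozenset(extent(shared)), frozenset(shared)))
    return lattice


ordered = sorted(galois_lattice(), key=lambda c: (-len(c[0]), sorted(c[0])))
for members, concepts in ordered:
    print(sorted(members), sorted(concepts))
# ['A', 'B', 'C', 'D'] []
# ['A', 'B', 'C'] ['Lng']
# ['B', 'C', 'D'] ['NS']
# ['B', 'C'] ['Lng', 'NS']
```

The sixteen agent subsets collapse into only four closed couples, which are the epistemic communities of this small example.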

Yet such a closed couple is actually an epistemic
group $(X, X^{\wedge})$ where $X^{\wedge\star} = X$. Closed couples
obviously correspond to epistemic groups closed
under $\wedge\star$, and therefore it follows:

Proposition 2. A closed couple is an epistemic
community.



This yields the fundamental property that the
GL is exactly the set of epistemic communities (a
graphical representation of a GL is drawn in fig. 3
from the sample community of fig. 2).

2 Community categorization 

2.1 Community structure rebuilding

Nonetheless, if a GL contains all epistemic com- 
munities, it is still unsure whether this tool itself 



3 By definition we set $\emptyset^{\wedge} = C$ and $\emptyset^{\star} = S$.



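4 Indeed, (i) $X^{\wedge\star}$ has the same intent as $X$ and (ii) it
is the largest agent set enjoying this property. Proof: (i)
comes from $((X^{\wedge})^{\star})^{\wedge} = X^{\wedge}$ 0; (ii) is proved by taking
$X' \supseteq X^{\wedge\star}$ with $X'^{\wedge} = X^{\wedge\star\wedge}$, so that $\{x\} \subseteq X' \Rightarrow$
$\{x\}^{\wedge} \supseteq X'^{\wedge} \Rightarrow \{x\}^{\wedge} \supseteq X^{\wedge\star\wedge} \Rightarrow \{x\}^{\wedge\star} \subseteq X^{\wedge\star}$, but
$\{x\} \subseteq \{x\}^{\wedge\star}$, so $\{x\} \subseteq X^{\wedge\star}$, hence $X' \subseteq X^{\wedge\star}$ □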
is meaningful or not as regards a community description
task, that is, whether a GL is able to
capture and reveal a given community's structure 
from data describing links between agents and con- 
cepts. The present section will be devoted to argu- 
ing why it can be used as such a tool. In particular, 
there are several stylized facts regarding the under- 
lying community structuration we would like GLs 
to rebuild, primarily the existence of subfields and 
significant groups of agents working within those 
subfields. Assuming a certain organization of sci- 
entific communities, the cornerstone of the justi- 
fication of our utilization of this method will lie 
(i) in the fact that it does partition a field into 
various smaller subfields corresponding to actual 
scientific communities, and (ii) eventually in the 
agreement between epistemic communities rebuilt 
by GLs and those explicitly given by domain ex- 
perts. 

Existing approaches Community and group 
detection has been for a long time under study 
in both computer science (graph theory as well 
as artificial intelligence) and sociology. Clustering
methods (CM) originating from computer science
tend either to use graph theory and propose
algorithms that partition graphs into a number of
clusters, fixed a priori or not (such as spectral
bisection or the Kernighan-Lin algorithm [29]), or to
consider object properties as multi-dimensional vectors
and endeavor to group objects according
to their relative similarity (such as k-means [15],
probabilistic neural networks |Mfi| or Kohonen maps
|2T]), similarity measures being mostly based on
Euclidean distance. Nevertheless, the main disadvantage
of these methods lies in the delicate justification
of their relevance for social science: they
eventually produce clusters for which it is hard to 
tell the connection with actual sociological com- 
munities. 

Approaches from sociology, on the contrary,
introduce hypotheses and tools proper to social
networks (like centrality [Hj or structural equivalence
[26]), thus yielding CMs better suited to social
group detection than generic computer science
methods, for instance hierarchical clustering,
blockmodeling [2], structural balance j> or, more
recently, structural cohesion and k-components
[28], and the Girvan-Newman algorithm and its
improvement by Radicchi [31].

Galois lattice theory offers a convenient way to 



group agents with respect to concepts they share, 
and in this sense, it is yet another CM. Some ap- 
plications of GL to social networks had also al- 
ready been explored, for instance by L. Freeman 
and D. White [Jj who actually apply GLs to agents 
and social events they attend in order to describe
"event categories". It is however worthwhile
to show why this very method is precisely relevant
for achieving epistemic community description
and categorization: in particular, for agent
and concept sets large enough, a GL will contain
a very large number of epistemic communities, with agents
belonging to many communities with various levels
of specificity.

2.2 Epistemic community structuration

About relevant categorization Let us first 
examine what CMs can reveal about data: from 
any input set of objects provided with attributes, 
CMs are designed to produce an output, namely 
clusters of objects. However, CMs propose a 
grouping even when the data is a completely random
set of objects having almost no attribute in common,
data for which any clustering would in fact be
meaningless or at least irrelevant for the purpose of 
the study. One can try for instance sorting objects 
from a yard sale, e.g. according to their size and 
value: certainly clustering algorithms give results, 
though these results are very unlikely to represent, 
say, functional categories. To be relevant, the use 
of CMs needs to be guided by particular assump- 
tions about the data structure: a necessary as- 
sumption is obviously that it does at least exhibit 
a clustered structure. In other words, it is neces- 
sary to inquire and specify what a given CM aims 
to rebuild: it would be very imprudent to trust 
its output without having checked its adequacy to 
data and defined what really constitutes a cluster, 
or a community, relatively to the data. In this 
view, both the choice of the CM and the choice of 
attributes (labelling of data) are decisive. 5 

5 One might thus distinguish (i) labelling irrelevant for 
the kind of data studied, while using a relevant CM; from 
(ii) CM irrelevant for the kind of data studied, however la- 
belled relevantly. Take for instance a linguist who would 
like to group the words light, dark, holy and evil as re- 
gards their semantic field. He might consider two criteria: 
brightness and goodness, and select e.g. the following nu- 
merical representations: light: +5 (brightness), +1 (good- 
ness); dark: -5, -1; holy: +1, +5; evil: -1, -5. For sure an 
irrelevant labelling, i.e. a bad choice in the previous cri- 
The same goes with Galois lattices: one can
draw a GL from any two sets of objects and a 
given relation between them, but there is no rea- 
son a priori that the lattice reveals a remarkable 
structure, even if it is built, represented or man- 
aged efficiently. In fact, there should exist a lot of 
data for which this categorization is just not rele- 
vant. Thus, in order to know whether and why GL 
is an appropriate CM for producing a taxonomy 
of knowledge communities, it is first necessary to 
inquire into the nature and organization of these very
communities. 

Assumptions Our main assumption is that 
there are fields of knowledge which can be de- 
scribed by concept lists (relevant labelling), and 
which are being implemented by sets of agents. 
Taking again the first example, some people are 
obviously linguists: among them, some deal with 
a given aspect, say prosody, while others study 
relevance; some other scientists deal with neuro- 
science, while a few of them are interdisciplinary 
and use both concepts. Knowledge fields and their 
corresponding agent sets are in our case epistemic 
communities, which are precisely what GLs consist
of (see prop. 2). Moreover, and also crucially,
these fields are hierarchically organized: (i) a general
field can be divided into many subfields, themselves
possibly having subcategories or belonging
to various general fields, and (ii) some fields can
be multidisciplinary or interdisciplinary in that
they respectively involve or integrate two or more
subfields [20]. For instance, cognitive science is
a general field gathering various subfields such as 
cognitive linguistics and cognitive neuroscience, 
thus being multidisciplinary. But the very sub- 
field cognitive neurolinguistics is interdisciplinary 
in that it mixes and coordinates the approaches 
from both parent disciplines. 

GL's acute relevance as regards these properties
actually results from its natural partial order $\sqsubseteq$,
defined such that given two epistemic communities
(or closed couples) $c = (X, X^{\wedge})$ and $c' =
(X', X'^{\wedge})$, we have $c \sqsubseteq c' \Leftrightarrow X \subseteq X'$. This partial
order indeed makes Qs,c.R be a lattice, hence en- 

teria (say, choosing the number of vowels and the number 
of consonants) would obviously give him a meaningless re- 
sult. But an irrelevant clustering method, e.g. based on 
euclidian distances, would also give him inconsistent out- 
put in grouping light with holy, and dark with evil, while 
he wanted light with dark, and holy with evil. 
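The linguist example of footnote 5 can be checked numerically. This is a small sketch using the (brightness, goodness) coordinates given in the footnote; the helper name `nearest` is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

# (brightness, goodness) coordinates taken from footnote 5.
WORDS = {
    "light": (5, 1),
    "dark": (-5, -1),
    "holy": (1, 5),
    "evil": (-1, -5),
}


def nearest(word):
    """Return the other word closest to `word` in Euclidean distance."""
    others = (w for w in WORDS if w != word)
    return min(others, key=lambda w: math.dist(WORDS[word], WORDS[w]))


for word in WORDS:
    print(word, "->", nearest(word))
# light -> holy
# dark -> evil
# holy -> light
# evil -> dark
```

A distance-based grouping thus pairs light with holy and dark with evil, whereas the linguist expected light with dark and holy with evil.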
Figure 3: Galois lattice of the extended sample
community (hierarchical structure drawn in solid
lines relatively to $\sqsubseteq$, i.e. "bottom" $\sqsubseteq$ "top"). The
medium level (dashed ellipse) contains closed couples
({A,B,C,E};{Lng}) and ({B,C,D};{NS}), obviously
corresponding to major fields (linguistics &
neuroscience). Just below, the hierarchy yields
interesting subcommunities like ({D};{NS,img,psy}) or
({B,C};{Lng,NS}), possibly prototypical of more
specific subfields.



joying a hierarchical structure. 6 More precisely, 
the order reflects a generalization/specialization
relation, in the sense that $c \sqsubseteq c'$ means that $c$
has a smaller extent and a greater intent than $c'$: $c$
represents a smaller community dealing with a bigger
concept set than $c'$, $c$ being thus more specific.
This hierarchy describes exactly the relations between
fields and subfields discussed in the previous
paragraph (fig. 3), as well as multidisciplinarity
and interdisciplinarity through particular patterns
called diamonds (fig. 4).
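The order and the diamond pattern can be illustrated on the four closed couples of the fig. 1 sample community, hardcoded below. This is a sketch under an assumption: a "diamond" is taken here to be two incomparable couples together with the couple holding their common concepts (top) and the couple gathering their common agents (bottom). This reading and the function names are ours, not a definition given in the text.

```python
from itertools import combinations

# Closed couples (extent, intent) of the fig. 1 sample community, hardcoded.
LATTICE = [
    (frozenset("ABCD"), frozenset()),
    (frozenset("ABC"), frozenset({"Lng"})),
    (frozenset("BCD"), frozenset({"NS"})),
    (frozenset("BC"), frozenset({"Lng", "NS"})),
]


def is_below(c, d):
    """True when c is at least as specific as d (extent of c within extent of d)."""
    return c[0] <= d[0]


def diamonds(lattice):
    """List (top, left, right, bottom) patterns, under the assumed definition."""
    found = []
    for left, right in combinations(lattice, 2):
        if is_below(left, right) or is_below(right, left):
            continue  # comparable couples cannot be the two sides of a diamond
        top = [c for c in lattice if c[1] == left[1] & right[1]]
        bottom = [c for c in lattice if c[0] == left[0] & right[0]]
        if top and bottom:
            found.append((top[0], left, right, bottom[0]))
    return found


# A smaller extent goes with a greater intent, and conversely.
print(all(is_below(c, d) == (d[1] <= c[1]) for c in LATTICE for d in LATTICE))  # True

for top, left, right, bottom in diamonds(LATTICE):
    print(sorted(top[0]), sorted(left[1]), sorted(right[1]), sorted(bottom[1]))
# ['A', 'B', 'C', 'D'] ['Lng'] ['NS'] ['Lng', 'NS']
```

The single diamond found has the whole agent set at its top, linguistics and neuroscience as sides, and the interdisciplinary community ({B,C};{Lng,NS}) at its bottom.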

2.3 GL and categorization 

Given their hierarchical structure, GLs are thus a 
relevant method to list and order epistemic com- 
munities and subcommunities. However, it is still 
unclear why a GL, which is an ordered although 
possibly huge set of epistemic communities, will 
produce a useful and usable categorization of the
community under study. A GL contains indeed 
all epistemic communities, a property already re- 
strictive since agent or concept sets whose intent 

6 A lattice is a partially-ordered set such that any pair of
elements has a least upper bound and a greatest lower bound;
in a finite lattice, this extends to any non-empty subset.
Note that the hierarchy here has nothing to do with the one
introduced in def. 3.
Figure 4: Zoom on fig. 3 showing one possible
diamond. A multidisciplinary field is at the diamond's
top (here "∅", which relatively to the context can be
considered as "cognitive science") and covers the two
intermediate subfields (cognitive linguistics and
cognitive neuroscience), which themselves, when combined,
define an interdisciplinary subfield (cognitive
neurolinguistics).



or extent is $\emptyset$ (i.e. they have nothing or nobody
in common), or more generally is not "closed", are
not epistemic communities and hence do not appear
in the GL. However, many real epistemic communities
are still of no interest, in that they do not correspond
to an existing or relevant field of knowledge,
because for instance they are too small and/or
too specific. In particular, for a single scientist
$\{s\}$, the closure $\{s\}^{\wedge\star}$ will in all likelihood be equal to
$\{s\}$, since there is a strong chance that no other
scientist uses at least the same concepts as $s$;
to some extent $s$ is "original". Certainly, knowing
that $(\{s\}, \{s\}^{\wedge})$ is an epistemic community is not
very enlightening. If however we consider that $s$
is working on a field $F$ (i.e. $F \subseteq \{s\}^{\wedge}$), then
as we add more and more agents working on $F$ to
$\{s\}$ and the cardinality of this agent set $S$ increases,
it becomes more and more likely that its (decreasing)
intent $S^{\wedge}$ reaches the actual knowledge field
$F$. The extent $S^{\wedge\star}$ will at this point be the whole
community working on $F$: there will thus be a gap
between the small uninteresting epistemic communities
reached hitherto, and the suddenly emerging
epistemic community $(S^{\wedge\star}, S^{\wedge} = F)$. In other
words, we conjecture that there is a relevant level
for which closed sets $S^{\wedge\star}$, and identically $C^{\star\wedge}$,
are representative of a field or a trend. This also
means that some epistemic communities listed by
GLs are deemed to be prototypical of these fields.
They are located between the whole agent set (obviously
too general) and too specific communities,
that is, at a medium level of generality which is to
be compared to Rosch's basic-level of categoriza- 



tion |3"H] . 

Given these assumptions, $\mathcal{G}_{S,C,R}$ is expected to
exhibit significant structural properties, as regards
e.g. highly-populated communities, for there
will be aggregates of agents around some precise
fields (i.e. epistemic communities with large
agent sets will prevail). These properties, once
identified, could help design criteria for detecting
in a somewhat automated manner major trends
(basic-level categories) within a more general field,
therefore making GL a powerful categorization
tool. This idea was introduced by the present
authors in a previous paper [34]; we now bring
empirical evidence in section 3 to support this
conjecture.

Comparison with existing approaches In
general, existing studies like those mentioned at
the beginning of this section attempt to infer
communities from a very general point of view (in
that there is no particular assumption on the nature
of the social groups that these CMs are supposed
to extract from data), and still focus and
rely only on single networks of social relationships
(e.g. coauthorship), which may prove insufficient
and inefficient for finding epistemic
communities which, as we said before, are not
necessarily socially linked. The data duality brought by
the reciprocal linkage of agents to concepts, and the
corresponding symmetry between agent-based and
concept-based notions (def. 1, 2, 3 and EC-2 vs.
def. 4), is moreover particularly well rendered by a
GL, being a hierarchy of closed couples considered
indifferently as agent sets or as concept sets.

It is also worth noting that some of these methods
produce hierarchically structured clusters (e.g.
hierarchical clustering and structural cohesion)
which seem close to the GL hierarchical representation
but are in fact more or less dendrograms. Yet a
dendrogram is a tree whereas a GL is a lattice, i.e.
a generalization of trees where ascendancies can
be multiple: a community is not bound to be embedded
into a lineage of increasing communities, it
can have ascendancies in various "directions"; in
other words, an agent can be part of many
non-embedded communities and can be to some extent
"pluridisciplinary".

GLs are hence a CM particularly well adapted to the
very prospect of building a knowledge community
taxonomy. Moreover, although GLs are principally
applied to scientific communities within this paper,
we could easily apply them to other
spheres such as economic communities,
where companies deal with sets of technologies.

3 Empirical results 

3.1 Experimental protocol 

To conduct our experiments on scientific communities,
we need data stipulating which agents use which
concepts. We consider article collections, assuming
that articles are a faithful account of what their
authors are working on. However, an important
point is now to define precisely what a concept is,
and in particular what a concept is such that we
can observe its appearance in an article. This notion
should be neither too precise nor too wide. Is it a
paradigm like "universal gravitation" or a simple
word like "operon"? For instance, authors provide
their articles with keywords: considering these
keywords as concepts seems to constitute
a relevant level of categorization while being a
convenient idea. Yet such keywords have not proven
to be very reliable indicators of the issues articles
are dealing with, for authors often omit important
keywords or specify poorly relevant ones; depending
on the database, keywords for the same article
can strongly differ, requiring the additional help
of an expert ontology.

Word groups as concepts Getting concepts
through words and nominal groups (terms) from
article title, abstract or body appears to be a safer
method than using keywords. At first we will thus
say that each word or nominal group is a concept,
even if we are still hampered by linguistic phenomena
like homonymy, polysemy, synonymy [17],
syllepsis and the fact that different authors
might have different definitions of the same word
or understand different concepts under an identical
nominal group. Some techniques have
been proposed (see e.g. [37]) and could be used
to solve these problems and determine the contextual
meaning of nominal groups; this is however
not the purpose of the present article, and
we will assume here that nominal groups represent
sufficiently distinguishable and homogeneous
references to concepts. Additionally, this definition
does not prevent us from observing higher-level
concepts such as theories or even paradigms,
since we can easily refer to these concepts a posteriori
by considering sets of words, for example
interpreting {"cell", "DNA", "gene", "genetics",
"molecular"} as molecular biology.

We will also only proceed with title and abstract
words, first because complete article contents are
rarely available on an exhaustive basis (that is,
exhaustively available for a whole community), and
second because using them would mean taking into
account too many very precise though irrelevant words
(thus dramatically increasing set sizes while
massively introducing noise).

Data processing The data presented here has been
processed according to the following methodology:

1. Collect and automatically process article data
(title, abstract, authors) for a given community
and period of time. To abstracts and titles we
apply a very basic linguistic processing (though a
good tradeoff between complexity and efficiency)
consisting of:

• Excluding insignificant words (stop-words), such
as common and rhetorical English words ("often",
"then", "we", etc.) and words irrelevant to the
domain ("demonstrate", "postulate", "specimen",
"study", etc.), using a list of more than 2,500
words, to which we add non-words such as figures,
percentages, dates, etc.

• Excluding rare words, i.e. words appearing n
times or less in the whole corpus (such as words
appearing only once, also called hapax legomena or
hapaxes). In our case, we took n = 4.

• Stemming the remaining words, i.e. reducing
morphological variants of words to their stem
(root form) using a slightly improved version of
Porter's stemming algorithm [30], and then creating
the corresponding word classes (for example,
"genetic" and "genetics" both reduce to "genet").

2. Identify unique authors and unique words, and
then create the weighted matrix M of links between
authors and words, where M_ij is equal to the number
of articles where author i used concept j (see
fig. 5).

Figure 5: Experimental protocol: steps 1 and 2
help create the core network and the corresponding
weighted relation matrix shown here (authors on
rows, concepts on columns). Some agents are removed
through step 3. The GL is then computed from the
binary matrix obtained after step 4.



3. Randomly keep a given fraction of authors, that
is, consider a representative sample of the whole
community by extracting some lines from matrix M
uniformly at random. We chose to keep each line
with probability 0.25 (this step aims only at
reducing GL computation time).



4. Make M a binary matrix relative to a given
threshold a, i.e. replace M_ij by 0 if M_ij < a,
else by 1: this means that an author will not be
related to a concept he or she used less than a
times. We actually used a threshold of 1 (increasing
the threshold would critically reduce both
computation costs and the significance of results).



5. Calculate the Galois lattice for the binary re- 
lation matrix M, using an implementation of 
Ganter's algorithm Pl2*5]. 
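
As an illustration only, steps 2, 4 and 5 can be sketched in Python on a toy corpus. The function names and the toy data are hypothetical, and the closed couples are enumerated here by naive repeated intersection of rows, which is not Ganter's algorithm and would not scale to a matrix of the size used in this paper.

```python
def build_weighted_matrix(articles):
    """Step 2: articles is a list of (authors, concepts) pairs, one per article.
    Returns (authors, concepts, M), where M[i][j] is the number of articles
    in which author i used concept j."""
    authors = sorted({a for auths, _ in articles for a in auths})
    concepts = sorted({c for _, cs in articles for c in cs})
    a_idx = {a: i for i, a in enumerate(authors)}
    c_idx = {c: j for j, c in enumerate(concepts)}
    M = [[0] * len(concepts) for _ in authors]
    for auths, cs in articles:
        for a in set(auths):
            for c in set(cs):
                M[a_idx[a]][c_idx[c]] += 1
    return authors, concepts, M


def binarize(M, threshold=1):
    """Step 4: an entry becomes 0 if it is below the threshold, else 1."""
    return [[1 if x >= threshold else 0 for x in row] for row in M]


def closed_couples(B, authors, concepts):
    """Step 5 (naive variant, an assumption of this sketch): the intents of a
    binary relation are exactly the intersections of sets of rows, together
    with the full concept set (intersection of no row at all)."""
    rows = [frozenset(c for j, c in enumerate(concepts) if B[i][j])
            for i in range(len(authors))]
    intents = {frozenset(concepts)}
    for r in rows:
        intents |= {r & x for x in intents}
    couples = []
    for intent in intents:
        # The extent gathers every author whose concepts include the intent.
        extent = frozenset(a for a, r in zip(authors, rows) if intent <= r)
        couples.append((extent, intent))
    return couples


# Toy corpus (hypothetical data): three articles, three authors.
toy_articles = [
    (["A", "B"], ["gene", "embryo"]),
    (["B"], ["gene", "neuron"]),
    (["C"], ["neuron"]),
]
toy_authors, toy_concepts, toy_M = build_weighted_matrix(toy_articles)
toy_lattice = closed_couples(binarize(toy_M, 1), toy_authors, toy_concepts)
```

On this toy corpus the lattice has four closed couples, among them the couple whose extent is {A, B} and whose intent is {"embryo", "gene"}.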



3.2 Results and comparison with 
random relations 

We ran the process on articles published between
1990 and 1995 obtained through a search for
"zebrafish" on the MedLine database, totalling 418
articles and mentioning 797 authors and 2129 words
after step 2 of the protocol. After step 3, only
218 authors and consequently 1817 concepts remained
in M. This is the relation matrix we used for
computing the GL (steps 4 and 5).

We noticed unsurprisingly that some authors and
concepts appeared significantly more frequently
than others. More precisely, there was a particular
distribution of links from agents to concepts
(proportion of agents being related to a given
number of concepts) and from concepts to agents:
many agents (resp. concepts) were linked to few
concepts (resp. agents) while few agents/concepts
were related to many concepts/agents. For this
reason, we could fear GL artefacts, since frequent
authors or frequent concepts are more likely to
share or respectively be shared by more concepts or
agents, thus being part of bigger closed sets and
increasing the number of these big sets, eventually
modifying artificially the GL structure, especially
high-size closed sets. We hence decided to compare
our results with those from GLs calculated with
randomly generated relations where this exact
property of the empirical data was kept. In other
words, we kept the distributions of links on rows
and columns in the relation matrix from step 3
while we reshuffled the links themselves, using an
algorithm introduced by Molloy & Reed [27]. 7
From now on, we call "random case" the results
obtained from computations on 40 such random
relation matrices. 8
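
The reshuffling can be sketched as a stub-matching procedure on the bipartite relation. This is a minimal sketch under stated assumptions, not the implementation used for the results above: the function name and the seed parameter are hypothetical, and the sketch returns link counts, since random matching may pair the same author and concept more than once (such repeated links collapse into one if the result is made binary again, which slightly lowers some degrees).

```python
import random


def reshuffle_links(B, seed=None):
    """Randomly rematch the links of binary matrix B while keeping the number
    of links on every row (author) and every column (concept).
    Returns a matrix of link counts; an entry may exceed 1."""
    rng = random.Random(seed)
    n_rows, n_cols = len(B), len(B[0])
    # One dangling link (stub) per original link, on each side.
    row_stubs = [i for i, row in enumerate(B) for _ in range(sum(row))]
    col_stubs = [j for j in range(n_cols)
                 for _ in range(sum(B[i][j] for i in range(n_rows)))]
    # Shuffling one side and pairing stubs in order gives a random matching.
    rng.shuffle(col_stubs)
    counts = [[0] * n_cols for _ in range(n_rows)]
    for i, j in zip(row_stubs, col_stubs):
        counts[i][j] += 1
    return counts


# Hypothetical toy relation: 3 authors, 3 concepts.
toy_B = [[1, 1, 0],
         [1, 0, 0],
         [0, 1, 1]]
toy_random = reshuffle_links(toy_B, seed=0)
```

Whatever the seed, the row sums and column sums of the returned counts equal those of the original matrix.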

7 Briefly, this algorithm consists in assigning to
each author a number of outgoing links to concepts
according to the desired distribution, and
identically assigning to each concept a number of
outgoing links to authors; then matching randomly
the dangling links between authors and concepts.

8 We also considered two other random cases: (i) keep
the same density in the relation (same proportion of
real links with respect to possible links), which is
approximately one link out of 30; and (ii) keep only
the distribution of links from agents to concepts.
Interestingly, the corresponding GLs are really poor:
they are dramatically small, with 16,000 epistemic
communities whose sizes do not exceed 5% of the whole
community in general (see fig. 6). Therefore, these
cases were not investigated further.

Empirical vs. random In order to confirm the
intuition that we have relatively large communities
sharing concepts (prototypical of a subfield), we
looked at the proportion of high-size epistemic
communities by drawing the distribution of agent set
sizes. In spite of the extremely rough linguistic
assumptions, we get strongly significant results
from empirical data, especially when compared to the
random case.

On the first graph (fig. 6) we plotted the raw
distributions of agent set sizes, i.e. the number
of epistemic communities relative to the size of
their agent set. The empirical GL contains 214,000
closed couples, with agent set sizes ranging from 1
to 196 (admittedly excepting the epistemic community
(S, ∅) containing all the 218 agents under study), to
be compared with an average of around 207,000 closed
couples in the random case (standard deviation
σ ≈ 64,700), with agent set sizes ranging only from
1 to 60 (σ ≈ 5). This means that while the empirical
GL is generally approximately the same size as random
GLs, it contains dramatically more high-size
epistemic communities (featuring 371 communities
representing more than a fifth of the whole agent
set, when random GLs hardly contain a dozen such
communities).

The comparison is a bit more striking on the second
graph (fig. 7), representing distributions
normalized with respect to GL size (that is, each
class size has been divided by the GL total size):
while there is an almost perfect fit on the density
of low-size closed couples, the empirical GL is
comparatively much denser on high-size couples, with
a deviation of one order of magnitude when
considering communities with more than 20 agents,
i.e. 10% of the whole. To underline this effect, we
finally considered cumulated densities on the third
graph (fig. 8), i.e. the proportion of closed couples
containing at least a given number of agents: 1% of
the GL in the empirical case is made of epistemic
communities containing 30 agents or more, versus
0.05% in the random case (respectively one thousandth
vs. one thirty-thousandth for communities with 50
agents or more).
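
The three quantities plotted in these graphs (raw counts, counts normalized by GL size, and cumulated densities) can be derived from the list of agent set sizes of all closed couples. The following is a minimal sketch; the function name and the toy sizes are hypothetical and unrelated to the empirical data.

```python
from collections import Counter


def size_distributions(extent_sizes):
    """Given the agent set size of every closed couple in a GL, return
    (raw, normalized, cumulated), each a dict keyed by agent set size:
    raw[s]        = number of closed couples with exactly s agents,
    normalized[s] = raw[s] divided by the total number of closed couples,
    cumulated[s]  = proportion of closed couples with at least s agents."""
    total = len(extent_sizes)
    raw = Counter(extent_sizes)
    sizes = sorted(raw)
    normalized = {s: raw[s] / total for s in sizes}
    cumulated = {}
    running = 0
    for s in reversed(sizes):  # accumulate from the largest size downwards
        running += raw[s]
        cumulated[s] = running / total
    return dict(raw), normalized, cumulated


# Hypothetical toy lattice with six closed couples.
toy_raw, toy_norm, toy_cum = size_distributions([1, 1, 1, 2, 2, 5])
```

Comparing an empirical GL with the random case then amounts to calling this function on each list of sizes and comparing the resulting dictionaries size by size.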

Rebuilding the structure High-size epistemic
communities appear to be specific to our empirical
data, suggesting that these high-size clusters
(that is, large groups of structurally equivalent
agents pointing to the same groups of concepts) are
a remarkable stylized fact, providing support to the
conjecture outlined in section 2.3.




Figure 6: Raw distributions of agent set sizes (log/lin
graph). Abscissa: agent set sizes (percentage of the
whole community); ordinate (log scale): number of
corresponding epistemic communities. Circles:
empirical data; triangles: random case (random data
with same distributions, 40 computations, with
standard deviation bars). Also plotted on the left are
two other random cases (see footnote 8): (i) random
data with same link density (squares) and (ii) random
data with same distribution from agents to concepts
only (crosses).




Figure 7: Normalized distributions of agent set sizes
(log/lin graph). Abscissa: agent set sizes; ordinate (log
scale): percentage of epistemic communities of a given
size within the whole GL. Circles: empirical data;
triangles: random case.

Figure 8: Cumulated densities (frequencies) of agent
set sizes. Abscissa: agent set sizes; ordinate:
percentage of epistemic communities containing at least
x agents. Dark circles: empirical data; grey triangles:
random case.

Nonetheless, it is also of great interest to know
whether these communities are significant and
relevant, and notably whether they help partition a
field into various smaller subfields corresponding
to real epistemic communities - a stylized fact just
as crucial for justifying the use of this very CM.

With the help of a zebrafish expert, Nadine 
Peyrieras, we observed that it was actually the 
case: 

(i) The first and biggest community is un- 
surprisingly centered around the word "ze- 
brafish" and contains 196 agents (90% of the 
whole). The fact that it does not reach 100% 
of the community as one would expect re- 
flects the imperfection of the empirical data 
collection and processing. 

(ii) Then, many large epistemic communities revolve
around a small set of words, namely "gene",
"expression", "pattern", "embryo", "develop" and
"vertebrate"; that is, their intents are a
combination of some of these words while their
extents generally contain around 100 agents. In
fact, a large majority of the 218 agents are present
in at least one of these communities; this word set
accordingly seems to characterize the core paradigm
of zebrafish researchers (even if each agent does
not use it wholly, which is credible if we consider
that in the relatively few article abstracts present
in the database most authors might not have used
every word of this word set but only a subset).
According to our expert and the literature [15], the
zebrafish is indeed being used as a vertebrate
animal model for the study of gene expression and
function during embryonic development.

Similarly, another word subset of interest is made
of "cloning", "stage", "transcription", "sequence",
"protein", "region", "encode", which constitute the
intents of relatively high-size epistemic communities
(50 agents). According to our expert, these words are
characteristic of the paradigm of molecular biology
or developmental studies in general, or of zebrafish
study, which consists in isolating a large number of
mutant fish lines, isolating the corresponding
mutated genes, then investigating their involvement
in biological processes. So, in the search for
relevant partitioning communities, it is reasonable
to ignore these words, which are too trivial and thus
noisy, along with the corresponding closed sets.

(iii) Thereafter, once these words are ignored, some
smaller and more precise communities appear around
non-paradigmatic words. Two major groups emerge
first: (i) one with the epistemic community based on
"growth" (39 agents), and (ii) the other around three
epistemic communities whose intents are "neuron" (70
agents), "brain" (36 agents) and {"nervous",
"system"} (28 agents), which have many agents in
common and altogether make up a group of 84 distinct
agents. Interestingly, there are only 15 agents
common to both groups (i) and (ii), so 108 agents are
well divided between the two. It is not fortuitous
that these groups correspond exactly to what the
literature describes as significant subfields,
explicitly 9 as well as implicitly 10 .

9 At the beginning of the 90's, according to Grunwald &
Eisen [15], "among the first mutants to be isolated was one
that was later discovered to be deficient in a growth factor
needed for axis determination, a second deficient in
myofibril organization, and a third in which a specific
portion of its nervous system failed to form".

10 According to the program of the first conference on
zebrafish development and genetics at the CSH Laboratory
in 1994, there were seven theme-based sessions, including
two on nervous system and one on growth control - so,
approximately, these two fields represented half the sessions
and half the community.

Figure 9: Very partial view of the actual GL (which
contains more than 200,000 closed couples)
hierarchically showing intents and extent sizes (in
brackets) of selected epistemic communities. Note that
there are various possible partitions of the whole
agent set, depending on what one is looking at: for
example objects, processes, methods, etc.

Some other much smaller communities help structure
the field further: the epistemic community based on
{"toxicity"} is made of 23 agents, with 9 shared with
"growth" and only three with "brain" - this group
might be related to the study of the toxic effect of
growth factors. The epistemic community based on the
word "acid" (45 agents) has an interesting descent,
{"acid", "amino"} (22 agents) and {"acid", "retino"}
(21 agents), with only 3 agents in common in the
extent of {"acid", "amino", "retino"}, so this is a
diamond with hardly any relation between people
working on/with amino acid and retinoic acid.
Also, the closed couple with intent {"spinal",
"cord"} (28 agents) includes the one based on
{"spinal", "cord", "neural", "ventral"} (20 agents)
with almost as many agents, suggesting that (i)
"spinal" and "cord" cannot be dissociated and (ii)
people working on spinal cord are also very familiar
with the concepts "neural" and "ventral".
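
The overlap figures quoted in item (iii) follow from elementary set operations on extents. The sketch below uses synthetic agent identifiers (plain integers, purely hypothetical) chosen only to reproduce the reported sizes; it shows that the figure of 108 agents is the size of the union, 39 + 84 - 15, and applies the same check to the diamond below "acid".

```python
def overlap(extent_a, extent_b):
    """Summarize how two agent sets overlap."""
    shared = extent_a & extent_b
    union = extent_a | extent_b
    return {
        "shared": len(shared),                     # agents in both sets
        "union": len(union),                       # agents in at least one set
        "only_one": len(union) - len(shared),      # agents in exactly one set
    }


# Synthetic extents reproducing the reported sizes (hypothetical identifiers).
growth = set(range(0, 39))       # 39 agents
nervous = set(range(24, 108))    # 84 agents, 15 of them also in "growth"
fields = overlap(growth, nervous)

amino = set(range(0, 22))        # 22 agents
retino = set(range(19, 40))      # 21 agents, 3 of them also in "amino"
acids = overlap(amino, retino)
```

With these sets, `fields` reports 15 shared agents out of a union of 108, and `acids` reports 3 shared agents out of a union of 40.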

All these findings are summed up in figure 9 and
show that GLs are efficient both for determining the
community paradigm (or common background) and for
finding prevailing communities as well as
medium-level subcommunities. A further study would
consist in observing how the community evolved
through the dynamics of the GL (see section 4.1), as
this embryo of partition is made from data of the
period 1990-1995 and is supposed to be a static
photograph of the community structure as of December
1995, which is certainly appreciably different now,
for some "fashionable" subfields may have been
abandoned while others have appeared.



Other findings and prospects From the random case
results we can also derive that distributions of
links between agents and concepts do not alone
account for the special embedded clustered structure
we observe - this result is neither surprising nor
new (see for instance [13]). Nevertheless, it would
be interesting to see which class of random relations
(or random bipartite graphs between agents and
concepts), if any, can produce the same kind of GL as
in our empirical case: other properties might
contribute to this structure, such as assortativity,
clustering coefficient, etc. In other words, how does
the existence of real communities actually translate
in terms of properties of the relation matrix M,
apart from a given distribution of links on rows and
columns, between agents and concepts?

Moreover, these results show the usefulness of
binding social and conceptual networks and taking
into account data from both networks, as proposed
previously in [34], since we have communities here
that are not socially linked and would certainly have
been difficult - if not impossible - to detect with
single-network based methods (namely, based on the
social network). It would be interesting to compare
GL-based communities with those obtained from
single-network data and, in particular, to see
whether a single-network community is included or not
in a GL-community. Finally, considering that
linguistic assumptions and processing were very poor,
these preliminary findings are also very encouraging
in the prospect of improving both data quality and
criteria for detecting communities (see sections 4.2
and 4.3).



4 Further directions 

4.1 Dynamic community monitor- 
ing 

Having so far categorized epistemic communities on
a static basis, it would be interesting to have an
account of their dynamics: we describe here how
particular field evolutions could translate into
properties both of epistemic communities and of the
GL.

Field progress and specialization We could easily
monitor (i) the progress or decline of a field
characterized by a given concept set, by observing
respectively an increase or a decrease of the
corresponding agent set (i.e. a variation in the size
of the population dealing with this concept set); and
(ii) the specialization or generalization of an
epistemic community and in particular its agent set,
by observing respectively an increase or a decrease
of its corresponding concept set (i.e. a variation in
the concept set this given agent set is working on).
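
Both indicators can be sketched by comparing the relation at two periods. This is a minimal sketch under stated assumptions: the relation is represented as a dictionary mapping each agent to a set of concepts, and all function names and toy data are hypothetical.

```python
def extent(relation, concept_set):
    """Agents related to every concept of concept_set."""
    concept_set = set(concept_set)
    return {a for a, cs in relation.items() if concept_set <= cs}


def intent(relation, agent_set):
    """Concepts shared by every agent of agent_set. An agent absent from the
    relation is treated as using no concept; an empty agent set yields all
    concepts."""
    agent_set = set(agent_set)
    if not agent_set:
        return set().union(*relation.values())
    return set.intersection(*(relation.get(a, set()) for a in agent_set))


def field_progress(rel_before, rel_after, concept_set):
    """Positive if the population dealing with concept_set grew (progress),
    negative if it shrank (decline)."""
    return len(extent(rel_after, concept_set)) - len(extent(rel_before, concept_set))


def specialization(rel_before, rel_after, agent_set):
    """Positive if the concepts shared by agent_set grew (specialization),
    negative if they shrank (generalization)."""
    return len(intent(rel_after, agent_set)) - len(intent(rel_before, agent_set))


# Hypothetical snapshots of the same community at two periods.
before = {"A": {"gene", "embryo"}, "B": {"gene"}, "C": {"neuron"}}
after = {"A": {"gene", "embryo", "neuron"}, "B": {"gene", "neuron"},
         "C": {"neuron", "gene"}}
gene_trend = field_progress(before, after, {"gene"})
ab_trend = specialization(before, after, {"A", "B"})
```

In this toy case the field based on "gene" gains one agent, and agents A and B come to share one more concept.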

New fields Alternatively, one could monitor the
emergence of new fields, being either entirely new
fields, or fields stemming from already existing
fields (namely new interdisciplinary or
multidisciplinary fields). The latter is the case
where diamonds emerge or grow: the epistemic
communities at the top or the bottom of a diamond
are increasing in agent set size. More precisely, we
distinguish two cases:

(i) emergence of a new "multidiscipline": the
regrouping of two existing fields under a more
general epistemic community containing agents from
the two former fields. This happens when the
epistemic community based on the union of two agent
sets S_1 ∪ S_2 is growing, thus having S_1^∧ ∩ S_2^∧
(the concepts common to both agent sets) as concept
set - in our earlier diamond example, it would
correspond to the growth of the "cognitive science"
community (diamond's top).

(ii) emergence of a new "interdiscipline": merging
of two existing fields in a more specific epistemic
community with concepts from the two former fields
(growth of the epistemic community based on the
union of two concept sets C_1 ∪ C_2, with
C_1^⋆ ∩ C_2^⋆, the agents common to both concept
sets, as agent set - e.g. the "cognitive
neurolinguistics" community in the same example,
i.e. diamond's bottom).
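
The top and bottom of such a diamond can be computed directly from the relation. The sketch below is illustrative only: the relation is a hypothetical dictionary from agents to concept sets, the function names are assumptions, and the toy concepts merely echo the example above.

```python
def extent(relation, concept_set):
    """Agents related to every concept of concept_set."""
    concept_set = set(concept_set)
    return {a for a, cs in relation.items() if concept_set <= cs}


def intent(relation, agent_set):
    """Concepts shared by every agent of agent_set (all concepts if empty)."""
    agent_set = set(agent_set)
    if not agent_set:
        return set().union(*relation.values())
    return set.intersection(*(relation[a] for a in agent_set))


def diamond(relation, c1, c2):
    """Return (top, bottom) for the fields defined by concept sets c1 and c2,
    each as an (agent set, concept set) pair.
    Top: concepts common to both fields, with every agent using them all.
    Bottom: agents common to both fields, with every concept they share."""
    s1, s2 = extent(relation, c1), extent(relation, c2)
    top_intent = intent(relation, s1) & intent(relation, s2)
    top = (extent(relation, top_intent), top_intent)
    bottom_extent = s1 & s2
    bottom = (bottom_extent, intent(relation, bottom_extent))
    return top, bottom


# Hypothetical toy relation.
toy = {
    "a1": {"cognition", "linguistics"},
    "a2": {"cognition", "linguistics", "neuroscience"},
    "a3": {"cognition", "neuroscience"},
    "a4": {"cognition"},
}
top, bottom = diamond(toy, {"cognition", "linguistics"},
                      {"cognition", "neuroscience"})
```

Monitoring the emergence of a multidiscipline or an interdiscipline then amounts to tracking the size of the agent set of `top` or `bottom` over successive periods.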

4.2 Linguistic processing 

The improvement of linguistic processing is most
urgent, and could first include the use of:

• Lemmatizers: algorithms giving the root of a
word, instead of using a stemmer like the one used
here (the "Porter stemmer", though it is also a
quite simple yet efficient lemmatizer);

• Taggers: algorithms detecting word grammatical
status in context, e.g. "subject", "verb", etc.;

• Morphological analyzers: algorithms recognizing
the shape of a word actually composed of two or
more words, like "molecular biology", "positron
emission tomography", etc.;

• Dictionaries: ontologies of the domain,
returning classes of words considered as equivalent
(as stated in section 3.1), like "zebrafish" and
"Brachydanio rerio", the former being the common
name of the latter;

• Disambiguators: algorithms determining the
meaning of words by examining the context in which
they are used [37].

Most of these tools already exist, although their
joint use would require careful integration work.
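
As a rough illustration of the dictionary and multi-word ideas only, the sketch below maps equivalent terms to a single concept and keeps listed multi-word terms together. The two lookup tables are hypothetical stand-ins for a real domain ontology, and the matching is a naive substring search that ignores word boundaries and punctuation.

```python
# Hypothetical lookup tables standing in for a domain ontology.
EQUIVALENTS = {"brachydanio rerio": "zebrafish"}
MULTIWORD_TERMS = ["molecular biology", "positron emission tomography",
                   "brachydanio rerio"]


def normalize_terms(text, multiword=MULTIWORD_TERMS, equivalents=EQUIVALENTS):
    """Return the sorted list of distinct concepts found in text, keeping
    known multi-word terms whole and replacing terms by their equivalent."""
    text = text.lower()
    terms = []
    # Longest phrases first, so that a long term is not split by a shorter one.
    for phrase in sorted(multiword, key=len, reverse=True):
        if phrase in text:
            terms.append(equivalents.get(phrase, phrase))
            text = text.replace(phrase, " ")
    # Remaining single words, also mapped through the equivalence table.
    terms += [equivalents.get(w, w) for w in text.split()]
    return sorted(set(terms))


toy_terms = normalize_terms("Molecular biology of Brachydanio rerio")
```

Stop-word removal and stemming, as described in the protocol, would still have to be applied to the remaining single words.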

Expert-processed data Alternatively, it could be
useful to compare these results with those from data
processed by human experts, where linguistic
processing problems largely disappear: for instance,
(i) by providing the experts with a fixed list of
concepts and making them classify agents according to
this list, or (ii) by making them identify a
restricted list of words they know to be sufficiently
descriptive for a given set of articles (e.g. protein
nomenclature consisting of very specific names [24]).

4.3 Community detection criteria 

The design of better criteria in order to categorize
and distinguish medium-level epistemic communities is
also a critical question. In this paper, we used the
agent set size, which is actually a quite simple
criterion bearing some major drawbacks, such as the
fact that small communities are ignored, even if they
correspond to well-defined though isolated fields. In
this respect, taking the communities which are close
to the top (also called anti-chains) can prove more
relevant, for they are just more specific than the
whole community, obviously the most general epistemic
community. More generally, before designing efficient
criteria, it is most important to find the properties
that make an epistemic community a "medium-level"
community; obviously the property of gathering an
important proportion of the agents is a good yet
insufficient first estimate. Hence, a more detailed
set of properties might for instance include (i)
distance from the top epistemic community, (ii)
distance from the empty epistemic community (∅, C),
and (iii) concept set size.

GL handling With a view to making this method
available to scientists, a complementary approach
could be to design software allowing navigation
through the lattice, for instance starting from the
top community and progressively narrowing the agent
set by specifying concepts from a list of possible
choices.
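
One navigation step of such a tool could look like the following sketch, which is only one possible design: the relation is a hypothetical dictionary from agents to concept sets, and the function name and toy data are assumptions.

```python
def navigate(relation, chosen_concepts):
    """One navigation step: given the concepts chosen so far, return
    (agents, shared, choices) where
    agents  = agents using every chosen concept,
    shared  = all concepts these agents have in common,
    choices = further concepts that would narrow the agent set, each mapped
              to the number of agents that would remain."""
    chosen = set(chosen_concepts)
    agents = {a for a, cs in relation.items() if chosen <= cs}
    if agents:
        shared = set.intersection(*(relation[a] for a in agents))
    else:
        shared = set().union(*relation.values())
    choices = {}
    for a in agents:
        for c in relation[a] - shared:
            choices[c] = choices.get(c, 0) + 1
    return agents, shared, choices


# Hypothetical toy relation.
toy = {
    "A": {"zebrafish", "gene", "neuron"},
    "B": {"zebrafish", "gene"},
    "C": {"zebrafish", "neuron", "brain"},
    "D": {"zebrafish", "growth"},
}
top_view = navigate(toy, [])            # start from the top community
neuron_view = navigate(toy, ["neuron"])  # narrow by choosing "neuron"
```

Starting with no concept chosen gives the top community; each call with one more concept moves down the lattice to a more specific community.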

Conclusion 

In this paper we proposed a method for describ- 
ing and categorizing communities of knowledge as 
well as capturing essential stylized facts regarding 
their structure. Assuming that such communities 
are structured in fields and subfields of common 
concerns, we aimed eventually at rebuilding this 
structure and in particular at providing an accu- 
rate taxonomy by automatically partitioning the 
community into various hierarchic representative 
subfields. 

After having reviewed some definitions of knowledge
communities or "epistemic communities" from social
epistemology and economics, we introduced a
definition that reflected the exact property of
belonging to the same community when sharing the same
concerns and working on the same concepts, a
conception close to structural equivalence. Since a
GL contains exactly all such epistemic communities,
we showed next that the Galois lattice structure was
a particularly adequate clustering method with
respect to this definition. However, it was unclear
whether this was sufficient to make it a useful
categorizing tool, in that the set of all epistemic
communities could possibly prove really huge and
intractable. We therefore conjectured that if
knowledge fields did indeed exist there should be a
gap in agent set size between epistemic communities
corresponding to real subfields and others (the
former gathering many more agents); this first
criterion would then allow us to discriminate within
the lattice between "uninteresting" communities and
significant ones. The lattice was thus expected to
provide the hierarchic structure we wanted to
rebuild.

Empirical results on an embryologist community
centered around the model animal zebrafish confirmed
this expectation even though data quality was
somewhat imperfect, mostly because of approximate
linguistic processing. High-size epistemic
communities were significantly numerous, especially
with respect to selected random cases, and we managed
to reproduce a partition of the community (figure 9)
confirmed as relevant by domain experts.

Our method diverges essentially from
single-network-based methods using for instance
relationships or semantic proximity, for it relies on
the very duality of epistemic communities (agents
having common interests) - it would nevertheless be
interesting to compare it with results obtained
through these other clustering methods. It could also
be fruitfully applied in other contexts such as the
field of technological cooperation between companies
through contracts, equivalent to authors working on
concepts through articles. Several improvements could
be carried out, such as better linguistic processing,
better criteria design, and better handling of the
lattice. Finally, as we endeavored to define,
describe and hierarchize epistemic communities,
further work will attempt to explain how we could
monitor their dynamics and the coevolution of the
social and conceptual structures.